Finding Similar Regions in Many Sequences

Authors

Ming Li

Bin Ma

Lusheng Wang

Abstract

Algorithms for finding similar, or highly conserved, regions in a group of sequences are at the core of many molecular biology problems. Assume that we are given n DNA sequences s_1, ..., s_n. The Consensus Patterns problem, which has been widely studied in bioinformatics research, asks in its simplest form for a region of length L in each s_i and a median string s of length L such that the total Hamming distance from s to these regions is minimized. We show that the problem is NP-hard and give a polynomial time approximation scheme (PTAS) for it. We then present an efficient approximation algorithm for the consensus pattern problem under the original relative entropy measure. As an interesting application of our analysis, we further obtain a PTAS for a restricted (but still NP-hard) version of the important consensus alignment problem that allows at most a constant number of gaps, each of arbitrary length, in each sequence.

Note: Some of the results in this paper have appeared as a part of an extended abstract presented in "Proc. ...
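To make the objective concrete, the following is a minimal Python sketch of the Consensus Patterns cost function together with a naive baseline. It is not the PTAS from the paper: it only tries every length-L substring of the input as a candidate median and, for each candidate, picks the closest window in every sequence. The function names (`hamming`, `best_window`, `total_cost`, `baseline_consensus`) and the toy sequences are assumptions made for illustration.

```python
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_window(seq: str, median: str) -> Tuple[int, int]:
    """Return (distance, start) of the window of seq closest to median.

    Ties are broken in favour of the smallest start index.
    """
    L = len(median)
    return min(
        (hamming(seq[i:i + L], median), i)
        for i in range(len(seq) - L + 1)
    )


def total_cost(seqs: List[str], median: str) -> Tuple[int, List[int]]:
    """Total Hamming distance from median to the best window in each sequence."""
    picks = [best_window(s, median) for s in seqs]
    return sum(d for d, _ in picks), [i for _, i in picks]


def baseline_consensus(seqs: List[str], L: int) -> Tuple[int, str, List[int]]:
    """Hypothetical baseline: try every length-L substring of the input
    as the median and keep the one with the smallest total cost.

    Assumes every sequence has length at least L.
    """
    best = None
    for s in seqs:
        for i in range(len(s) - L + 1):
            candidate = s[i:i + L]
            cost, starts = total_cost(seqs, candidate)
            if best is None or cost < best[0]:
                best = (cost, candidate, starts)
    return best


if __name__ == "__main__":
    # Toy input: "ACGT" occurs exactly in all three sequences.
    print(baseline_consensus(["ACGTAC", "TTACGT", "GACGTT"], 4))
```

The baseline restricts the median to substrings that actually occur in the input, so it can miss the optimal median string; it is meant only to show how the total Hamming distance in the problem statement is evaluated.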

Similar resources

Full-length Characterization of S1 Gene of Iranian QX Avian Infectious Bronchitis Virus Isolates, 2015

Background and Aims: Avian infectious bronchitis (IB) has been prevalent in most chicken farms in recent years, in spite of the IB vaccination program that has been widely performed in Iran. To better understand the molecular epidemiology of IBV in Iran, the full-length sequences of the S1 gene of Iranian QX IBVs were determined and phylogenetic analysis was done using some sequences of IBV. M...


Designing Of Degenerate Primers-Based Polymerase Chain Reaction (PCR) For Amplification Of WD40 Repeat-Containing Proteins Using Local Allignment Search Method

Degenerate primers-based polymerase chain reaction (PCR) is commonly used for isolation of unidentified gene sequences in related organisms. For designing the degenerate primers, we propose the use of a local alignment search method to find conserved regions long enough to design an acceptable primer pair. To test this method, a WD40 repeat-containing domain protein from Beauveria bass...


Mining Biological Repetitive Sequences Using Support Vector Machines and Fuzzy SVM

Structural repetitive subsequences are the most important portion of biological sequences and play crucial roles in the corresponding sequence's fold and functionality. The biggest class of repetitive subsequences is "Transposable Elements", which has its own sub-classes depending on the contexts' structures. Many studies have been performed to critically determine the structure and function of repetitiv...


MAP2: multiple alignment of syntenic genomic sequences

We describe a multiple alignment program named MAP2 based on a generalized pairwise global alignment algorithm for handling long, different intergenic and intragenic regions in genomic sequences. The MAP2 program produces an ordered list of local multiple alignments of similar regions among sequences, where different regions between local alignments are indicated by reporting only similar regio...


Cross chromosomal similarity for DNA sequence compression

Current DNA compression algorithms work by finding similar repeated regions within the DNA sequence and then encoding these regions together to achieve compression. Our study on chromosome sequence similarity reveals that the length of similar repeated regions within one chromosome is about 4.5% of the total sequence length. The compression gain is often not high because of these short lengths....



Journal: J. Comput. Syst. Sci.

Volume: 65

Publication date: 2002
